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Abstract —  The  encoding  of  information  about  the  outside 
world  in  the  temporal  activity  of  sensory  neurons  is  an  extremely 
complex  process  that  has  eluded  the  understanding  of  the  scien¬ 
tific  community  for  decades.  The  reconstruction  of  sensory  stim¬ 
uli  from  observed  neuronal  activity  provides  a  basis  within  which 
we  might  ascertain  the  nature  of  the  sensory  information  encoded 
by  the  cells.  We  present  a  decoding  strategy  for  predicting  the 
sensory  stimulus  from  the  neuronal  response  that  is  based  on  the 
mechanisms  of  encoding.  For  a  class  of  encoding  mechanisms 
characterized  by  a  linear  function  followed  by  a  memoryless  non¬ 
linearity,  referred  to  as  Wiener  systems,  the  Bayesian  estimator 
is  derived  from  the  transformational  properties  of  the  nonlinear¬ 
ity.  The  result  is  a  reconstruction  paradigm  in  which  the  ability 
to  predict  sensory  stimuli  from  the  neuronal  response  depends 
heavily  upon  how  well  the  encoding  process  has  been  character¬ 
ized,  and  thus  provides  a  measure  of  our  understanding  of  the 
underlying  physiological  process. 
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I.  Introduction 

The  reconstruction  of  sensory  inputs  from  recorded 
neural  activity  has  proved  to  be  an  invaluable  tool  in 
understanding  how  the  information  about  the  outside 
world  is  encoded  in  the  sensory  pathway.  We  and  oth¬ 
ers  have  shown  previously  that  a  surprising  amount  of 
detail  can  be  reconstructed  from  ensemble  neural  activ¬ 
ity  in  sensory  pathways  using  a  relatively  simple  linear 
reconstruction  technique  [1],  [2],  [3].  The  accepted  ap¬ 
proach  involves  the  correlation  of  the  neural  response 
with  the  sensory  stimulus  in  order  to  determine  the  opti¬ 
mal  “reverse-filter”  that  predicts  the  sensory  input  from 
the  neuronal  activity. 

Although  this  approach  has  been  useful  in  evaluating 
what  and  how  much  information  is  being  encoded  in  the 
neural  activity,  it  does  not  directly  utilize  the  knowledge 
of  the  underlying  encoding  mechanisms  and  therefore 
provides  little  or  no  insight  in  this  regard.  In  fact,  the 
dominant  belief  is  that  the  decoding  mechanism  can  re¬ 
main  linear  even  if  the  encoding  process  is  highly  non¬ 
linear  in  nature,  although  there  has  been  no  rigorous 
support  for  this  claim.  Recent  work  in  place  cell  encod¬ 
ing  in  the  hippocampus  has  begun  to  shed  some  light  on 
the  utility  of  the  encoding/decoding  approach  [4],  [5], 
but  this  has  yet  to  become  the  mainstream  perspective. 

Due  to  the  inherent  nonlinearity  of  encoding  contin¬ 


uous  sensory  information  with  a  discrete  neuronal  pro¬ 
cess,  no  neural  system  is  purely  linear  in  its  form.  Obvi¬ 
ously  general,  non-parametric  nonlinear  models  could 
be  used  to  represent  the  encoding  process.  However, 
many  early  sensory  systems  are  well  characterized  as 
a  linear  system  followed  by  a  static,  memoryless  non¬ 
linearity.  This  type  of  cascade  system  has  historically 
been  referred  to  as  a  Wiener  system  or  LN  system,  and 
is  structurally  much  more  simple  than  the  correspond¬ 
ing  general  Wiener  series  expansion  that  would  be  nec¬ 
essary  to  characterize  the  same  dynamics.  The  result 
is  that  the  output  is  representative  of  the  neuronal  firing 
rate,  rather  than  the  discrete  events  that  are  observed  ex¬ 
perimentally.  A  common  perspective  is  that  the  output 
of  the  Wiener  system  is  the  rate  of  an  inhomogeneous 
Poisson  process  that  results  in  the  stochasticity  of  the 
observed  event  times.  Given  such  an  encoding  mech¬ 
anism,  the  question  then  remains  as  to  how  we  might 
decode  the  sensory  input  from  the  observed  neuronal 
activity,  and  what  limitations  might  the  encoding  pro¬ 
cess  place  on  the  prediction. 

II.  The  Wiener  System 

The  Wiener  system  is  a  class  of  nonlinear  systems 
in  which  a  linear  dynamical  system  is  followed  by  a 
static,  memoryless  nonlinearity,  as  shown  in  Figure  1. 
The  Wiener  system  has  been  widely  utilized  to  describe 

Fig.  1.  Wiener  System 


the  relationship  between  the  stimulus  and  the  firing  rate 
of  a  neuron  [6],  [7],  [8],  In  contrast  to  the  complex 
nature  of  a  higher  order  Wiener  kernel  representation, 
the  Wiener  system  provides  a  relatively  simple  means 
for  describing  the  inherent  nonlinearity  in  neural  en¬ 
coding.  A  simple  half-wave  rectification  achieves  the 
non-negative  characteristics  of  the  firing  rate,  and  has 
also  been  widely  used  as  the  static  nonlinearity  in  the 
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encoding  process. 

We  can  express  the  dynamics  of  the  system  shown 
in  Figure  1  as  r  =  /( g  *  s  +  e),  where  r  £  WNx  1  is  the 
firing  rate  of  the  neuron  over  N  time  steps,  s  £  M.Nx  1 
is  the  corresponding  sensory  stimulus,  g  £  MLx  1  is  the 
first  order  Wiener  kernel,  e  £  KA,xl  is  a  white  Gaus¬ 
sian  noise  process,  /( • )  is  the  static  nonlinearity,  and  * 
represents  convolution. 

It  has  been  shown  previously  that  the  linear  block 
of  the  system  can  be  estimated  as  g  =  a<t>~  1  bv,-,  where 
<f>„  £  RLxL  is  the  input  auto-covariance  Toeplitz  ma¬ 
trix,  <f>s/-  £  ]Rix  1  is  the  cross-  covariance  between  the  in¬ 
put  and  the  output,  and  a  £  M  is  a  scaling  factor  [9].  The 
scaling  factor  is  a  result  of  the  static  non-linearity,  and 
is  identically  2  when  the  non-linearity  is  a  half-wave 
rectification  (Stanley,  unpublished). 

III.  Decoding  from  the  Wiener  System 

Current  approaches  to  decoding  are  based  on  tech¬ 
niques  that  utilize  the  correlation  between  the  stimulus 
and  response,  normalized  by  the  correlation  structure 
in  the  response.  In  this  paper,  we  will  refer  to  this  ap¬ 
proach  as  the  “reverse  filter”  approach,  since  the  output 
is  essentially  treated  as  an  input,  and  vice  versa.  Al¬ 
though  this  perspective  provides  a  tool  for  reconstruct¬ 
ing  sensory  inputs  from  recorded  neural  activity  [2], 
[3],  it  makes  no  direct  reference  to  the  underlying  en¬ 
coding  mechanisms. 

Alternatively,  knowledge  of  the  encoding  strategy 
can  instead  lead  us  to  questions  concerning  the  decod¬ 
ing  of  sensory  stimuli  from  neural  activity  based  on  the 
nature  of  the  encoding  process.  The  decoding  prob¬ 
lem  involves  the  estimation  of  the  input  s  from  the 
recorded  neuronal  response,  given  the  encoding  strat¬ 
egy.  For  the  Wiener  systems  described  above,  this  as¬ 
sumes  prior  identification  of  the  linear  stage  utilizing 
an  independent  data  set,  although  the  basic  premise  de¬ 
scribed  here  applies  to  more  general  model  structures. 
Within  a  Bayesian  framework,  we  pose  this  problem  as: 

smmse  =  £{s|r}  (1) 

The  optimal  estimator,  that  minimizes  the  Bayesian 
mean-square  error  (MMSE),  is  the  expected  value  of  the 
stimulus  conditioned  on  the  response.  Embedded  in  the 
conditional  density  is  the  underlying  encoding  mecha¬ 
nism,  determined  independently.  For  jointly  Gaussian 
processes,  the  problem  reduces  to  a  well  known  rela¬ 
tionship.  In  the  problem  presented  here,  we  will  restrict 
the  input  s  and  noise  e  to  Gaussian  white  processes,  but 
s  and  r  will  not  be  jointly  Gaussian  due  to  the  effect  of 
the  static  nonlinearity.  We  can,  however,  determine  the 
conditional  density  p(s|r).  In  the  general  case,  in  order 


to  determine  Zs  {s|r},  we  must  perform  the  following 
integration: 


£{s|r} 


/: 


sp(s\r)ds 


(2) 


where  p(s|r)  is  the  conditional  density.  We  can  write 
the  conditional  density  as: 


p{s\r)  = 


p{r\s)p{s) 
P(r ) 


(3) 


Given  the  stimulus  s,  the  responses  at  different  times 
are  independent,  giving: 


p(  r|s)  =  n^rai-) 

k 


(4) 


As  shown  in  the  appendix,  for  half-wave  rectification 
we  can  write: 


p(r[k]\s)  = 


1 


sjl Trrr2 

+ 


na; 

«(r[*]) 


'um 


/ 


o 


e 


dx  (5) 


where: 


.% O-  J-oo 


f  M  =  E  s[m]s[k  - 1 

m= 0 


U(r[k])  is  the  unit  step  function,  resulting  in  1  for 
r[k\  >  0,  zero  else,  and  S(  • )  is  the  Dirac  delta  function. 
The  density  of  the  stimulus  is  p(s)  ~  A^(0,o^7),  com¬ 
pleting  the  numerator  of  Equation  3.  The  denomina¬ 
tor  is  simply  a  scaling  term,  and  can  be  obtained  from 
the  integration  of  the  numerator  over  s.  The  Bayesian 
estimator  can  then  be  obtained  from  the  integration  in 
Equation  2.  However,  the  estimation  then  requires  an 
/V-dimensional  integration,  which  is  impractical  for  any 
non-trivial  data  length. 


An  alternative  to  the  MMSE  estimator  is  the  maxi¬ 
mum  a  posteriori  (MAP)  estimator,  which  instead  in¬ 
volves  a  maximization  of  the  conditional  density: 

s map  =  argmaxp(s|r)  =  argmaxp(r|s)p(s)  (6) 

S  S 

which  now  is  an  (V-dimensional  optimization  problem, 
and  therefore  becomes  tractable  for  relatively  small 
data  sets,  whereas  the  corresponding  (V-dimensional  in¬ 
tegration  is  not  feasible.  For  larger  A,  the  data  sets  can 
be  segmented  for  computational  efficiency.  Numeri¬ 
cal  searches  were  implemented  using  a  gradient  search 
method  (quasi-newton). 

So  we  now  have  two  alternate  methods  by  which 
we  may  reconstruct,  or  decode,  sensory  inputs  from 
recorded  neuronal  responses:  the  reverse  filter  ap¬ 
proach  and  the  Bayesian  MAP  approach.  In  subsequent 
analyses,  we  will  compare  the  two  techniques. 
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IV.  Results 

First  consider  a  static  case,  where  the  linear  system  g 
is  a  scalar  value,  or  equivalently  proportional  to  an  im¬ 
pulse  function.  For  the  example  shown  here,  the  gain  of 
the  static  linearity  is  positive.  The  output  of  the  linear 
stage  is  then  passed  through  a  half-wave  rectification, 
giving  r  =  f(gs  +  e).  Figure  2  shows  the  results  from 
this  simple  case.  We  can  see  that  for  the  case  when 


Fig.  2.  Wiener  System  with  Static  Linearity 


Results  from  reverse  filter  and  MAP  estimator  for  static  Wiener  sys¬ 
tem.  The  left  two  panels  show  the  actual  (thin)  and  reconstructed 
(thick)  stimuli  for  the  reverse  filter  approach  (top)  and  the  MAP  ap¬ 
proach  (bottom).  Scatter  plots  are  shown  on  the  right  of  actual  stim¬ 
ulus  value  versus  the  predicted  value  from  the  two  techniques. 


the  stimulus  is  positive,  the  two  estimates  are  nearly 
identical.  The  difference  in  the  two  estimates  is  evi¬ 
dent  when  the  stimulus  is  negative.  The  reverse  filter 
approach  provides  an  estimate  of  zero,  while  the  MAP 
estimator  yields  a  value  that  is  slightly  negative.  Run¬ 
ning  many  such  examples  revealed  that  the  MAP  esti¬ 
mator  provided  a  significantly  better  prediction  of  the 
stimulus  than  that  obtained  from  the  reverse  filter,  as 
shown  in  Figure  3.  In  these  plots,  data  points  above 


Fig.  3.  Summary  Statistics  for  Static  Case 


Reverse  Filter  Error  Reverse  Filter  Correlation 

Left  plot  shows  the  mean  square  error  of  the  reconstruction  for  the 
reverse  filter  approach  versus  that  for  the  MAP  approach  for  the  static 
Wiener  system  for  several  different  simulations.  Right  plot  shows 
the  same  for  the  correlation  between  actual  and  estimated  stimulus. 


the  line  indicate  that  the  measure  associated  with  the 
MAP  estimator  is  greater  and  vice  versa.  So,  for  ex¬ 
ample,  in  Figure  3,  the  MAP  estimator  provides  smaller 
prediction  error  and  larger  correlation. 

With  the  dynamic  case,  we  first  consider  an  entirely 
linear  system,  where  r  =  g  *  s  4-  e.  Again,  we  see  that 
the  MAP  estimator  provides  significantly  better  results 


than  those  obtained  from  the  reverse  filter  method,  as 
shown  in  Figure  4.  For  the  linear  case,  we  were  able 


Fig.  4.  Linear  Dynamical  System 


Linear  Dynamic  Reverse  Filter 


Results  from  reverse  filter  and  MAP  estimator  for  dynamic  linear 
system.  The  left  two  panels  show  the  actual  (thin)  and  reconstructed 
(thick)  stimuli  for  the  reverse  filter  approach  (top)  and  the  MAP  ap¬ 
proach  (bottom).  The  right  plots  show  the  corresponding  scatter 
plots  of  estimation  error  (top)  and  correlation  (bottom)  for  the  re¬ 
verse  filter  estimate  versus  the  MAP  estimate. 


to  compute  both  the  MMSE  estimator  and  the  MAP  es¬ 
timator  for  comparison,  and  found  that  they  were  not 
significantly  different. 

Finally,  the  results  from  the  dynamic  Wiener  system 
are  shown  in  Figure  5.  In  this  case,  the  response  is  the 


Fig.  5.  Dynamical  Wiener  System 


Reverse  Filter 


index 


0.2  0.4  0.6 

Reverse  Filter  Correlation 


Results  from  reverse  filter  and  MAP  estimator  for  dynamic  Wiener 
system.  The  two  panels  on  the  left  show  the  actual  (thin)  and  re¬ 
constructed  (thick)  stimuli  for  the  reverse  filter  approach  (top)  and 
the  MAP  approach  (middle).  The  right  plots  show  the  corresponding 
scatter  plots  of  estimation  error  (top)  and  correlation  (bottom)  for 
the  reverse  filter  estimate  versus  the  MAP  estimate. 


half-wave  rectified  output  of  the  linear  system,  and  is 
written  r  =  /( g  *  s  +  e).  Again  we  see  that  the  MAP  ap¬ 
proach  tends  to  outperform  the  reverse  filter  approach, 
although  the  effect  is  not  so  dramatic  here.  As  a  con¬ 
firmation  that  the  gradient  search  method  was  indeed 
providing  a  more  optimal  solution  in  the  MAP  sense, 
the  cost  function  given  in  6  was  evaluated  at  both  the 
MAP  estimate  for  s  and  the  reverse  filter  estimate  for  s 
(data  not  shown).  The  MAP  estimate  always  provided  a 
significantly  larger  p(r|s). 
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V.  Discussion 

We  have  shown  that  knowledge  of  the  mechanisms 
by  which  neurons  encode  sensory  information  can  sig¬ 
nificantly  enhance  the  process  of  decoding  the  sensory 
stimulus  from  the  recorded  neuronal  activity.  The  re¬ 
verse  filter  approach,  in  which  the  correlation  structure 
between  response  and  stimulus  is  used  to  predict  the 
stimulus  from  the  neuronal  response,  assumes  indepen¬ 
dence  between  the  response  and  the  stimulus  prediction 
error,  although  this  is  not  generally  the  case. 

The  Bayesian  estimator,  as  an  alternate  approach,  is 
based  on  the  underlying  conditional  density  functions 
between  stimulus  and  response.  The  difference  be¬ 
tween  the  reverse  filter  approach  and  the  Bayesian  ap¬ 
proach  is  perhaps  most  clearly  outlined  with  the  static 
case  with  the  half-wave  rectification  discussed  previ¬ 
ously.  The  reverse  filter  approach  provides  a  clean  esti¬ 
mate  of  the  corresponding  stimulus  for  positive  values 
of  response,  but  when  the  response  is  rectified  to  zero, 
the  predicted  stimulus  is  also  zero.  The  MAP  estima¬ 
tor,  however,  provides  a  negative  offset  for  a  rectified 
response,  essentially  relying  on  the  fact  that  we  know 
something  about  the  distribution  of  the  stimulus  even 
when  the  response  is  rectified,  and  yields  the  best  esti¬ 
mate  in  the  sense  of  maximizing  the  posterier  density. 
The  same  heuristic  argument  holds  for  the  dynamic 
case,  providing  us  with  an  improved  reconstruction  of 
the  sensory  stimulus  from  the  recorded  response. 


where  p(s\r)  is  the  conditional  density  of  the  stimulus 
given  the  response.  The  conditional  density  can  be  ex¬ 
pressed  as: 


P(s\r) 


p(r\s)p{s)  =  p(r\s)p{s) 
P(r )  f  p(r\s)p(s)ds 


Given  the  stimulus  s,  the  density  of  a  can  be  written  as: 


px\Ax)  = 


1 


:exp 


f  C*-gs)2) 

l  2a?  J 


If  the  static  nonlinearity  is  a  half-wave  rectification, 
then  the  density  of  the  response  given  the  stimulus  be¬ 
comes  [10],  [1 1]: 


px{r)U{r)  +f’Y(0)8(r) 


:exp  < 


jr-gs)2 

2o2 


(*-gs)2 

2a2e 


} 


dx 


where  U  ( r )  is  the  unit  step  function,  which  is  1  for  r  > 
0,  and  0  else,  and  Px(x)  is  the  cumulative  distribution 
function  for  a.  The  effect  of  the  half-wave  rectification 
is  to  concentrate  the  density  for  negative  values  of  input 
to  the  nonlinearity  at  0  in  the  density  of  the  output  of 
the  nonlinearity.  The  estimate  in  Equation  7  can  then 
be  evaluated  numerically  using  the  conditional  density 
p(r\s)  described  above. 


The  methodology  presented  here  holds  for  more  gen¬ 
eral  types  of  nonlinearities,  although  the  transformation 
of  density  functions  is  case  dependent.  The  assumption 
of  a  Gaussian  white  noise  stimulus  can  be  relaxed  for 
a  more  general  treatment  of  the  problem,  although  the 
assumption  of  additive  Gaussian  white  noise  is  critical 
for  our  current  derivation.  The  result  of  this  work  is 
a  step  towards  recovery  of  information  loss  due  to  the 
inherent  rectifying  properties  in  neuronal  encoding. 
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Appendix 

Consider  a  random  variable  r  that  is  a  nonlinear  func¬ 
tion  /(•)  of  a  random  variable  a,  which  is  the  sum 
of  random  variable  s  scaled  by  a  known  parameter  g 
with  an  additive  noise  term  e,  given  by  r  =  /(a)  = 
f(gs  +  e).  The  scaling  g  £  ffi  is  assumed  known,  as 
are  the  densities  ps(s)  and  pe{e).  The  stimulus  s  is  as¬ 
sumed  zero-mean  Gaussian  ~  £^(0,Oj ),  as  is  the  noise 
e  ~  A£( 0,0").  The  output  r  is  observed.  The  best  esti¬ 
mate  of  s,  given  the  observation  of  r,  is: 

/oo 

sp(s\r)ds  (7) 
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